METHOD FOR AUTOMATICALLY
PROVIDING A COMPRESSED RENDITION
OF A VIDEO PROGRAM IN A FORMAT
SUITABLE FOR ELECTRONIC SEARCHING
AND RETRIEVAL

TECHNICAL FIELD 

This invention relates generally to a method for automati- 
cally providing a compressed rendition of a video program 
in a format suitable for electronic searching and retrieval, 
and more particularly to a method for providing a com- 
pressed rendition of a video program in a format suitable for 
electronic searching and retrieval on the World Wide Web. 

BACKGROUND 

The rapid growth of the World Wide Web began with the 
development of an on-line browser having a graphical user 
interface. Graphical interfaces provide a number of impor- 
tant advantages, including the ability to rapidly scroll 
through a document to get to a particular point of interest.
Moreover, the ability to interact with a medium other than
text (e.g., images or audio) increases the rate at which
information can be conveyed since an image often conveys 
an idea faster and more efficiently than text. 

While graphical browsers provide an adequate interface 
for text and images, they provide an inadequate interface for 
video programs. The sequential nature of the video and 
audio components of a video program impedes rapid access 
to such programs on the World Wide Web by graphical 
browsers. Furthermore, because of the limited bandwidth of 
networks supporting the World Wide Web, and particularly 
the limitations of most users' connections to such networks, 
it takes a long time to transmit a program with its full 
content. For example, at a connection speed of 28,800 bits 
per second, it could take up to about 45 minutes to transmit 
even a three or four minute audiovisual segment with sound 
and full-motion video. As a result, video program providers 
sometimes form a compressed version of the video program 
by manually extracting and retaining selected frames from 
the program while other frames are discarded. The selected 
frames and accompanying text, typically taken from a tran- 
script of the program, result in a document that may subse- 
quently be made available over the World Wide Web. 
However, the generation of this document is typically a 
tedious and time consuming task since it must be created by 
a manual process. 
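
By way of illustration only, the transmission-time figure given above can be checked with a short calculation. The function name and the 9,720,000-byte payload below are assumptions chosen so that the result matches the 45-minute figure; the size of the segment itself is not stated here, and protocol overhead is ignored.

```python
def transfer_minutes(size_bytes: int, bits_per_second: int) -> float:
    """Return the idealized time, in minutes, to send size_bytes over a link.

    Protocol overhead and modem compression are ignored (assumption).
    """
    return size_bytes * 8 / bits_per_second / 60


# Hypothetical payload: 28,800 bit/s * 2,700 s / 8 = 9,720,000 bytes.
minutes = transfer_minutes(9_720_000, 28_800)
print(f"{minutes:.1f} minutes")
```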

Accordingly, it would be advantageous to provide a 
rendition of a video program which can be automatically 
generated and which allows easy interaction with graphical 
browsers with a minimum of information loss. 

SUMMARY OF THE INVENTION

The present inventors have realized that a pictorial tran- 
script representation of a video program is particularly well 
suited for on-line searching and retrieving applications such 
as browsing on the World Wide Web. Pictorial transcripts are 
compact representations of video programs which are auto- 
matically generated by selecting representative frames or 
images from the video program and combining them with a 
second media component such as audio or text which is 
associated with each representative frame. Properly chosen, 
the representative frames convey a substantial portion of the 
information content of the original video program. 
Moreover, pictorial transcripts may be generated in an 
automatic fashion, thus eliminating the substantial time and 
effort that was previously required to place a document of
this type on the World Wide Web. 

The inventive method provides a compressed rendition of
a video program in a format suitable for electronic searching
and retrieval. An electronic pictorial transcript representation
of the video program is initially received. The video
program has a video component and a second information-bearing
media component associated therewith. The pictorial
transcript representation includes a representative frame
from each segment of the video component of the video
program and a portion of the second media component
associated with the segment. The electronic pictorial transcript
is transformed into a hypertext format to form a
hypertext pictorial transcript. The hypertext pictorial transcript
is subsequently recorded in an electronic medium.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is an example of one page of a printed pictorial
transcript generated from a television news program in
accordance with the method of the present invention.

FIG. 2 illustrates the use of server push for viewing an 
HTML pictorial transcript. 

FIG. 3 shows an example of a page format that may be
employed when performing keyword searching.

FIG. 4 shows an example of an index that may be 
generated for HTML pictorial transcripts. 

DETAILED DESCRIPTION 

A method for automatically compressing multimedia data
is disclosed in U.S. patent application Ser. No. 08/252,861,
filed Jun. 2, 1994, pending, and in Shahraray, B., and Gibbon, D.
C., "Automatic Generation of Pictorial Transcripts of Video
Programs," in Multimedia Computing and Networking 1995,
Proc. SPIE 2417, Feb. 1995, the latter reference being
hereby incorporated by reference. In accordance with this
known method, a video program is compressed by selecting
certain frames from the entire sequence of frames to serve as
representative frames. For example, a single frame may be
used to represent the visual information contained in any
given scene of the video program. A scene may be defined
as a segment of the video program over which the visual
contents do not change significantly. Thus, a frame selected
from the scene may be used to represent the entire scene
without losing a substantial amount of information.
A series of such representative frames from all the scenes in
the video program provides a reasonably accurate representation
of the entire video program with an acceptable degree
of information loss. These compression methods in effect
perform a content-based sampling of the video program.
Additional information may be found in B. Shahraray,
"Scene Change Detection and Content-Based Sampling of
Video Sequences," Digital Video Compression: Algorithms
and Technologies 1995, SPIE 2419.
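
By way of illustration only, content-based sampling of the kind described above might be sketched as follows. This is a deliberately simplified stand-in and not the method of the cited references: the frame representation (a flat sequence of grayscale values), the mean-absolute-difference measure, and the threshold value are all assumptions made for the example.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical: flattened grayscale pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sample_representative_frames(frames: List[Frame], threshold: float) -> List[int]:
    """Return the index of one representative frame per detected scene.

    A new scene is assumed to start when a frame differs from the current
    representative frame by more than `threshold`.
    """
    if not frames:
        return []
    selected = [0]  # the first frame represents the first scene
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[selected[-1]], frames[i]) > threshold:
            selected.append(i)
    return selected


frames = [[10, 10, 10], [11, 10, 9], [200, 190, 210], [201, 191, 209], [12, 10, 10]]
print(sample_representative_frames(frames, threshold=30.0))  # [0, 2, 4]
```

In this toy input the last frame closely resembles the first, so the sampled series contains two nearly identical representative frames separated by an intervening scene.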

In the previously cited documents, a plurality of repre- 
sentative frames are selected by sampling the video program 
in a content-based manner to retain a single representative 
frame from each scene. While the series of frames selected
in this manner may not contain all the visual information in
the original video program, when combined with another
medium that was a part of the original video program, such
as audio or closed-captioned text, the resulting multimedia
program adequately conveys the information content of the
video program in a condensed format. To generate this
condensed multimedia program, a correspondence must be
formed between the representative frames and the audio or
textual medium. For example, each representative frame
should be associated with the portion of the audio or textual 
medium corresponding to the entire scene from which the 
representative frame was selected. This correspondence may 
be accomplished in a relatively simple manner because in 
the original video program the video medium is already 
synchronized with the audio or textual information. Addi- 
tional details concerning the formulation of this correspon- 
dence may be found in the previously cited references. 
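
By way of illustration only, one way to form such a correspondence from timing information is sketched below. It assumes that each scene is identified by its start time and that each caption carries a timestamp; these representations, and the function name, are hypothetical and are not taken from the cited references.

```python
from bisect import bisect_right
from typing import List, Tuple


def align_captions(
    scene_starts: List[float],
    captions: List[Tuple[float, str]],
) -> List[str]:
    """Group timed captions by the scene in which each caption begins.

    scene_starts: ascending start time (seconds) of each scene.
    captions: (timestamp, text) pairs from the closed-caption stream.
    Returns one joined caption string per scene.
    """
    per_scene: List[List[str]] = [[] for _ in scene_starts]
    for timestamp, text in captions:
        # Index of the last scene that starts at or before the caption.
        scene = bisect_right(scene_starts, timestamp) - 1
        if scene >= 0:
            per_scene[scene].append(text)
    return [" ".join(parts) for parts in per_scene]


scene_starts = [0.0, 12.5, 30.0]
captions = [
    (1.0, "Good evening."),
    (13.0, "In the capital today,"),
    (20.0, "talks resumed."),
    (31.0, "Now the weather."),
]
print(align_captions(scene_starts, captions))
```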

The representative frames, the audio or textual compo- 
nents associated therewith, and the correspondence between 
the representative frames and the audio or textual compo- 
nents comprise electronic data representing a condensed 
version of a video program, which hereinafter will be 
referred to as the condensed electronic data. 
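
By way of illustration only, the condensed electronic data might be held in a structure such as the following. The class and field names are hypothetical; the only elements drawn from the description are the representative frame, the associated audio or textual component, and the association between them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptEntry:
    """One representative frame plus its associated second-media portion."""
    image_path: str        # hypothetical location of the frame image
    start_seconds: float   # scene start time in the original program
    caption: str = ""      # closed-caption text for the scene, if any
    audio_path: str = ""   # audio segment for the scene, if any


@dataclass
class CondensedProgram:
    """Condensed electronic data for a whole video program."""
    title: str
    entries: List[TranscriptEntry] = field(default_factory=list)


program = CondensedProgram("Evening News")
program.entries.append(TranscriptEntry("frame001.jpg", 0.0, caption="Good evening."))
print(len(program.entries))
```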

In the case of closed-captioned text, a printed rendition of 
the condensed electronic data may be provided. The printed 
rendition constitutes a so-called pictorial transcript in which 
each representative frame is printed with a caption contain- 
ing the portion of the closed-caption text corresponding to 
the scene from which that representative frame is taken. 
FIG. 1 is an example of one page of printed pictorial 
transcript generated from a television news program. 
Alternatively, rather than printing the condensed electronic 
data as a pictorial transcript, the data simply may be elec- 
tronically stored for subsequent retrieval. Thereafter the data 
may be printed, displayed on a computer, or transmitted in 
any desired format. 

In addition, the condensed electronic data may be gener- 
alized further to refer to the series of representative frames 
and the audio segments corresponding thereto rather than 
closed-caption segments. In this case the condensed elec- 
tronic data may be conveniently stored electronically and 
then displayed by sequentially displaying the representative 
frames and, simultaneous with each displayed frame, play- 
ing the corresponding audio segment. 

In accordance with the present invention, electronic data 
representing a condensed version of a video program is 
formatted in hypertext markup language (HTML) so that the 
resulting HTML document is compatible with the World 
Wide Web. HTML documents refer to on-line documents 
having words or graphics that contain links to other on-line 
documents. Such documents are commonly referred to as 
hypertext documents. By selecting the link (using a mouse 
or key command) the user is connected to another document 
that may be located on the same or a different computer. It 
should be noted that while the present invention is described 
in terms of an on-line document formatted in HTML, more 
generally the present invention is applicable to hypertext 
documents formatted in languages other than HTML, such 
as HyperCard, for example.

An HTML document is automatically produced from the 
condensed electronic data by an HTML generator, which 
converts the data into an HTML document. Procedures to 
implement such a generator are well known. As used 
hereinafter, the terms HTML document and HTML pictorial 
transcript refer to the condensed electronic data that is
formatted in HTML. The HTML document or pictorial
transcript may be composed of individual records connected 
by links. The individual records of the HTML document or 
pictorial transcript are referred to as pages. 
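
By way of illustration only, a minimal HTML generator might resemble the following sketch. It assumes the condensed electronic data has already been reduced to (image path, caption) pairs; the function name and the page layout are assumptions, and an actual generator may differ substantially.

```python
from html import escape
from typing import List, Tuple


def render_page(title: str, entries: List[Tuple[str, str]]) -> str:
    """Render (image_path, caption) pairs as a single HTML page."""
    parts = [f"<html><head><title>{escape(title)}</title></head><body>"]
    for image_path, caption in entries:
        # Each representative frame is followed by its caption text.
        parts.append(f'<p><img src="{escape(image_path, quote=True)}" alt=""><br>')
        parts.append(f"{escape(caption)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_page("Evening News", [("frame001.jpg", "Profits rose <5% this year.")])
print(page)
```

Escaping the caption text keeps characters such as "<" in the closed-caption stream from being interpreted as markup.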

The HTML pictorial transcript may be advantageously 
divided over two or more HTML pages, depending on the 
size of the document. An HTML document consisting of 
only a single HTML page is impractical for all but the 
shortest programs (e.g., less than ten minutes in length) 
because WWW browsers, which sometimes lack parallel
loading capability, begin to exhibit unacceptable delays. In
fact, even browsers having parallel loading capability such
as Netscape will often be taxed. The size of each HTML
page may be determined in any appropriate manner. For
example, the HTML generator may begin a new page after
a predetermined number of images (e.g., 25) have been
placed on a single page. Alternatively, the pages may be
divided on the basis of story- and topic-based segmentation.
The various pages comprising the HTML document may be
connected by hypertext links.
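
By way of illustration only, dividing a transcript into pages of a predetermined number of images and linking the pages might be sketched as follows. The "pageN.html" file naming and the function names are assumptions made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(entries: Sequence[T], per_page: int = 25) -> List[List[T]]:
    """Split entries into consecutive pages of at most per_page items."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [list(entries[i:i + per_page]) for i in range(0, len(entries), per_page)]


def nav_links(page_index: int, page_count: int) -> str:
    """Return previous/next anchors for a zero-based page index.

    Pages are assumed to be stored as page1.html, page2.html, and so on.
    """
    links = []
    if page_index > 0:
        links.append(f'<a href="page{page_index}.html">Previous</a>')
    if page_index < page_count - 1:
        links.append(f'<a href="page{page_index + 2}.html">Next</a>')
    return " | ".join(links)


pages = paginate(list(range(60)), per_page=25)
print([len(p) for p in pages])
print(nav_links(1, len(pages)))
```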

A graphical browser is a graphical interface that can 
access documents on the WWW in an HTML format. The 
HTML pictorial transcript may be conveniently accessed 
and searched using conventional graphical browsers such as
Mosaic, Spry and Explorer, for example.

The HTML pictorial transcript may be displayed in a 
variety of different formats. The user may have the option of 
selecting among several predetermined formats, or 
alternatively, the user may customize a format via the web
browser. The server, in turn, re-executes the HTML generator
routine, which now produces the HTML document in the
desired format. Additionally, if no selection is made, the
HTML transcript may be displayed in a default format
(which may be one of the standard formats). In some
embodiments of the invention, the user may be provided
with a plurality of different default formats from which to
choose.

In one embodiment of the invention, a standard or default 
format displays an HTML pictorial transcript that is the
equivalent of the printed rendition of a pictorial transcript
such as shown in FIG. 1. Other formats may modify this
particular format to reduce retrieval time and improve page
layout. For example, some formats may be employed to
reduce the required bandwidth by displaying only a subset of
the representative frames contained in the HTML pictorial
transcript. Many different criteria may be employed to
determine which representative frames to retain and which
to omit.

One criterion that may be used to eliminate selected representative
frames is based on the presence of redundant frames.
For example, if the original program contains a shot of a
given scene at one time and subsequently contains substantially
the same scene after one or more other scenes have
intervened, the resulting pictorial transcript will contain two
representative frames that are substantially the same.
Accordingly, one of the redundant representative frames
may be eliminated to reduce bandwidth. In the resulting
HTML pictorial transcript it may be desirable to replace the
second appearance of the redundant representative frame with
a hypertext link that points back to the first appearance of
the representative frame.
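
By way of illustration only, replacing later appearances of a redundant representative frame with a link to its first appearance might be sketched as follows. The similarity test is passed in as a parameter because no particular comparison is prescribed here; the function name and the ("image", index) and ("link", index) result format are assumptions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")


def link_redundant_frames(
    frames: Sequence[F],
    is_similar: Callable[[F, F], bool],
) -> List[Tuple[str, int]]:
    """Decide, per frame, whether to show it or link to an earlier one.

    Returns ("image", i) when frame i is kept, or ("link", j) when the
    frame is substantially the same as an earlier kept frame j.
    """
    kept: List[int] = []
    result: List[Tuple[str, int]] = []
    for i, frame in enumerate(frames):
        # Find the first kept frame that this frame duplicates, if any.
        match = next((j for j in kept if is_similar(frames[j], frame)), None)
        if match is None:
            kept.append(i)
            result.append(("image", i))
        else:
            result.append(("link", match))
    return result


shots = ["anchor desk", "field report", "anchor desk"]
print(link_redundant_frames(shots, lambda a, b: a == b))
```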

Other criteria that may be used to eliminate selected representative
frames are based on random subsampling (e.g., retain
every other representative frame) or, alternatively, the size of
the JPEG image file. For example, it may be desirable to
retain only the largest of the image files on the assumption
that file size is correlated with the complexity of the
image, since more complex images typically convey more information.
Conversely, it may be desirable to retain only the
smallest of the image files to further minimize bandwidth
requirements. Alternatively, it may be advantageous to retain
only representative images that differ from one another by
more than a prescribed amount, as determined by scene
matching techniques. The representative images that are
eliminated in this manner may be replaced by hypertext
anchors linked to the similar representative images that were
retained.
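
By way of illustration only, the subsampling and file-size criteria might be sketched as follows. The function names, the file names, and the byte counts are hypothetical; in practice the sizes would be read from the stored JPEG files.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def every_other(items: Sequence[T]) -> List[T]:
    """Retain every other item, starting with the first."""
    return list(items[::2])


def retain_by_size(sizes: Dict[str, int], keep: int, largest: bool = True) -> List[str]:
    """Retain the `keep` largest (or smallest) files, in program order.

    sizes maps each image file name to its size in bytes, in program order.
    """
    ranked = sorted(sizes, key=lambda name: sizes[name], reverse=largest)
    chosen = set(ranked[:keep])
    return [name for name in sizes if name in chosen]


sizes = {"f1.jpg": 40_000, "f2.jpg": 12_000, "f3.jpg": 55_000, "f4.jpg": 20_000}
print(retain_by_size(sizes, keep=2))                  # largest files
print(retain_by_size(sizes, keep=2, largest=False))   # smallest files
print(every_other(list(sizes)))
```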



